I collected cheese price data from the United States Department of Agriculture (USDA). It ranges from 2000 to 2022, but structure of it changed after 2018. From 2000 to 2018, data was aggregated in one sheet. After 2018, the USDA created an API that recorded each year’s data individually for each kind of cheese. I’ll need to merge these files together.
One thing to note is that I’ll be focusing on the import (not domestic) prices for this cheese to see how it trends over time. I also want to note that the price is in dollars per pound. You can also check out these FAQs about USDA market data.
I’m loading the tidyverse library and lubridate package so I can clean up my data.
library(tidyverse)
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.2 ──
## ✔ ggplot2 3.3.6 ✔ purrr 0.3.4
## ✔ tibble 3.1.8 ✔ dplyr 1.0.10
## ✔ tidyr 1.2.0 ✔ stringr 1.4.1
## ✔ readr 2.1.2 ✔ forcats 0.5.2
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
library(lubridate)
##
## Attaching package: 'lubridate'
##
## The following objects are masked from 'package:base':
##
## date, intersect, setdiff, union
library(dplyr)
library(janitor)
##
## Attaching package: 'janitor'
##
## The following objects are masked from 'package:stats':
##
## chisq.test, fisher.test
I’ll import my first file here, which ranges from January 2000 to February 2018. I’ll filter it so I’m looking at just the domestic cheese. I’ll also remove repeated rows.
Then, I’ll fix my dates so R reads them as dates.
thru_2018_clean <- read_csv("data-raw/thru_2018.csv") |>
distinct() |> #I'm using this to remove duplicate rows
clean_names() |>
filter(
region != "FOREIGN" #I don't want the foreign cheese
) |>
mutate(
date = mdy(date)
) |>
select(
date, type, high_price, low_price, region)
## Rows: 48237 Columns: 12
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (4): Date, Region, Commodity, Type
## dbl (8): Report Number, Report Year, Low Price, High Price, Imported Low Pri...
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
thru_2018_clean |> glimpse()
## Rows: 16,671
## Columns: 5
## $ date <date> 2000-01-07, 2000-01-07, 2000-01-07, 2000-01-07, 2000-01-07…
## $ type <chr> "CHEDDAR 10# PRINTS", "CHEDDAR 40# BLOCK", "CHEDDAR SINGLE …
## $ high_price <dbl> 1.6725, 1.4850, 1.6225, 2.5500, 1.6675, 1.5175, 1.6225, 2.1…
## $ low_price <dbl> 1.1825, 1.3075, 1.1625, 2.3500, 1.3650, 1.3650, 1.3850, 1.8…
## $ region <chr> "NORTHEAST", "NORTHEAST", "NORTHEAST", "NORTHEAST", "NORTHE…
Now I’ll import all the sheets that came after 2018. I put the midwest, northeast and west files for each year following 2018 in a folder called file-list and import them all here.
list_of_files <- list.files(path = "file-list", recursive = TRUE,
pattern = "\\.csv$",
full.names = TRUE) #imports the file list
cheese_dirty <- list_of_files |>
purrr::set_names(nm = (basename("file-list") |> tools::file_path_sans_ext())
) |>
purrr::map_df(read_csv,
col_names = TRUE) #sets names
## Rows: 397 Columns: 37
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (31): group, category, report begin date, report end date, published dat...
## dbl (3): slug id, price min, price max
## lgl (3): lot Desc, mostly low price, mostly high price
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
## Rows: 408 Columns: 37
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (31): group, category, report begin date, report end date, published dat...
## dbl (3): slug id, price min, price max
## lgl (3): lot Desc, mostly low price, mostly high price
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
## Rows: 416 Columns: 37
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (31): group, category, report begin date, report end date, published dat...
## dbl (3): slug id, price min, price max
## lgl (3): lot Desc, mostly low price, mostly high price
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
## Rows: 416 Columns: 37
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (31): group, category, report begin date, report end date, published dat...
## dbl (3): slug id, price min, price max
## lgl (3): lot Desc, mostly low price, mostly high price
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
## Rows: 352 Columns: 37
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (31): group, category, report begin date, report end date, published dat...
## dbl (3): slug id, price min, price max
## lgl (3): lot Desc, mostly low price, mostly high price
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
## Rows: 217 Columns: 37
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (31): group, category, report begin date, report end date, published dat...
## dbl (3): slug id, price min, price max
## lgl (3): lot Desc, mostly low price, mostly high price
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
## Rows: 208 Columns: 37
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (30): group, category, report begin date, report end date, published dat...
## dbl (3): slug id, price min, price max
## lgl (4): lot Desc, origin, mostly low price, mostly high price
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
## Rows: 208 Columns: 37
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (30): group, category, report begin date, report end date, published dat...
## dbl (3): slug id, price min, price max
## lgl (4): lot Desc, origin, mostly low price, mostly high price
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
## Rows: 208 Columns: 37
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (30): group, category, report begin date, report end date, published dat...
## dbl (3): slug id, price min, price max
## lgl (4): lot Desc, origin, mostly low price, mostly high price
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
## Rows: 176 Columns: 37
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (30): group, category, report begin date, report end date, published dat...
## dbl (3): slug id, price min, price max
## lgl (4): lot Desc, origin, mostly low price, mostly high price
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
## Rows: 275 Columns: 37
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (31): group, category, report begin date, report end date, published dat...
## dbl (3): slug id, price min, price max
## lgl (3): lot Desc, mostly low price, mostly high price
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
## Rows: 260 Columns: 37
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (30): group, category, report begin date, report end date, published dat...
## dbl (3): slug id, price min, price max
## lgl (4): lot Desc, origin, mostly low price, mostly high price
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
## Rows: 260 Columns: 37
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (30): group, category, report begin date, report end date, published dat...
## dbl (3): slug id, price min, price max
## lgl (4): lot Desc, origin, mostly low price, mostly high price
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
## Rows: 260 Columns: 37
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (30): group, category, report begin date, report end date, published dat...
## dbl (3): slug id, price min, price max
## lgl (4): lot Desc, origin, mostly low price, mostly high price
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
## Rows: 220 Columns: 37
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (30): group, category, report begin date, report end date, published dat...
## dbl (3): slug id, price min, price max
## lgl (4): lot Desc, origin, mostly low price, mostly high price
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
cheese_dirty